The Paycheck Protection Program: Progressivity and Tax Effects

README for Data and Code Files

by David Splinter, Eric Heiser, Michael Love, and Jacob Mortenson

/////////////////////////////////
//// Overview
/////////////////////////////////
See the do file "Master.do" for the list of Stata code used in this project.
Output files are noted below and in the online Excel spreadsheet used to produce tables/figures.

This study makes use of confidential taxpayer data. To access these data, researchers 
can apply to the IRS Statistics of Income Joint Statistical Research Program at
www.irs.gov/uac/soi-tax-stats-joint-statistical-research-program.

The code used for initial extracts of the confidential tax data cannot be posted.
It can be shared upon request with researchers who have access to the tax data and an IRS email address.

/////////////////////////////////
//// Base Data Files
/////////////////////////////////
PPP loan data is from the Small Business Administration (SBA) “All PPP Loan Data” file, last updated on January 1, 2023. The data is split into 13 csv files.
SBA data was accessed on February 24, 2023, from https://data.sba.gov/dataset/ppp-foia
Tax data is population data accessed May-July 2023.

/////////////////////////////////
//// Code for extracting and preparing data
/////////////////////////////////
import_ppp.do -- Takes the raw SBA PPP data as input and condenses it into one csv file for matching to the tax data on the CDW and for other analyses
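
As a rough illustration of the condensing step (a Python sketch, not the authors' Stata code in import_ppp.do), the snippet below stacks several csv files that share a header into a single file. The function name combine_csv_parts and any file names passed to it are hypothetical, and the sketch assumes every part has an identical header row.

```python
import csv


def combine_csv_parts(part_paths, out_path):
    """Stack csv files with identical headers into one csv.

    Returns the number of data rows written (header excluded).
    """
    rows_written = 0
    header = None
    with open(out_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        for part in part_paths:
            with open(part, newline="", encoding="utf-8") as in_file:
                reader = csv.reader(in_file)
                part_header = next(reader, None)
                if part_header is None:
                    continue  # skip empty files
                if header is None:
                    # Write the header once, taken from the first part.
                    header = part_header
                    writer.writerow(header)
                elif part_header != header:
                    raise ValueError(f"Header mismatch in {part}")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Checking each header before appending guards against silently stacking files whose columns are in a different order.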

//// SQL code //// 
clean_ppp.txt -- Runs the PPP data from import_ppp.do through a string cleaning procedure to prepare for the match. Produces sba_clean.csv
clean_entity_code.txt -- Runs the business tax data through the string cleaning procedure. Produces entities_clean.csv
clean_schk.txt -- Runs additional passthrough data on tax-exempt income through the string cleaning procedure. Produces schk_loans.csv
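
The cleaning rules themselves are in the SQL files and are not described here. As a generic illustration only, a name-cleaning step of this kind might look like the Python sketch below. The function clean_name, the suffix list, and the specific rules (uppercase, drop punctuation, collapse whitespace, strip trailing legal suffixes) are all assumptions, not the authors' procedure.

```python
import re

# Hypothetical list of legal suffixes to strip; not taken from the project code.
LEGAL_SUFFIXES = {"LLC", "INC", "CORP", "CO", "LTD"}


def clean_name(raw):
    """Return a normalized business name for matching (illustrative rules)."""
    text = raw.upper()
    # Delete periods and apostrophes so "L.L.C." becomes "LLC".
    text = re.sub(r"[.']", "", text)
    # Replace any other non-alphanumeric character with a space.
    text = re.sub(r"[^A-Z0-9 ]", " ", text)
    # Splitting on whitespace also collapses repeated spaces.
    tokens = text.split()
    # Drop trailing legal suffixes such as LLC or INC.
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)
```

Applying the same function to both the SBA names and the tax-record names keeps the two sides comparable before any matching is attempted.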

save_sba.txt -- Splits SBA data (sba_clean.csv) into 6 files (A_CO,CT_ID,IL_MI,MN_NY,OH_TN,TX_Z) to segment the matching process and avoid crashes.
save_ent.txt -- Splits business tax data (entities_clean.csv) into 6 files (A_CO,CT_ID,IL_MI,MN_NY,OH_TN,TX_Z)
save_k.txt -- Splits passthrough data (schk_loans.csv) into 6 files (A_CO,CT_ID,IL_MI,MN_NY,OH_TN,TX_Z)
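
One way to picture the six-way split is as a lookup from a state code to a file label. The Python sketch below assumes the groups are defined by the alphabetical order of two-letter state abbreviations, which is an inference from the file labels rather than something the SQL code is known to do. The function name state_group is hypothetical.

```python
from bisect import bisect_left

# Last state code in each group except the final one, read off the file labels.
# Assumption: groups follow alphabetical order of two-letter abbreviations.
UPPER_BOUNDS = ["CO", "ID", "MI", "NY", "TN"]
GROUP_LABELS = ["A_CO", "CT_ID", "IL_MI", "MN_NY", "OH_TN", "TX_Z"]


def state_group(state_code):
    """Return the file label whose alphabetical range contains state_code."""
    code = state_code.strip().upper()
    # bisect_left gives the index of the first upper bound >= code,
    # or len(UPPER_BOUNDS) when code sorts after "TN".
    return GROUP_LABELS[bisect_left(UPPER_BOUNDS, code)]
```

Under this assumption, applying the same grouping to all three inputs ensures each loan is compared only with tax records in the same segment.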

A_CO.txt (CT_ID,IL_MI,MN_NY,OH_TN,TX_Z) -- One of six files, identical in format but covering different states, that match PPP observations to tax records.
The posted file is for states A to Colorado; the other files are identical except for the states covered.
This file takes sba_clean_A_CO.csv, entities_clean_A_CO.csv, and schk_loans_A_CO.csv as input and also reads in Form 1040 Schedule C data.
Matches with the business tax data are made and saved in the following order: exact name and address (exact_matches_A_CO.csv), fuzzy name and address within zip code (zip_matches_A_CO.csv),
fuzzy name and address within city (city_matches_A_CO.csv), fuzzy name and address within county (county_matches_A_CO.csv), and fuzzy name and address within state (state_matches_A_CO.csv).
Then exact and fuzzy matches are made with the Schedule K data on name and address (k_matches_A_CO.csv). Finally, exact and fuzzy matches are made with the Schedule C data (c_matches_A_CO.csv).
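
The ordering of the business tax data matches (exact first, then fuzzy within progressively wider geography) can be sketched in Python as follows. This is an illustration of the cascade idea only, not the authors' SQL. The similarity measure (difflib's SequenceMatcher ratio), the 0.9 cutoff, the rule of scoring a pair by the weaker of its name and address similarities, and the record field names are all assumptions.

```python
from difflib import SequenceMatcher

# Geographic levels tried in order after the exact pass, narrowest first.
LEVELS = ["zip", "city", "county", "state"]
THRESHOLD = 0.9  # hypothetical similarity cutoff


def similarity(a, b):
    """Similarity in [0, 1] between two strings."""
    return SequenceMatcher(None, a, b).ratio()


def cascade_match(loans, entities, threshold=THRESHOLD):
    """Return {loan_id: (entity_id, stage)} from exact, then fuzzy, passes.

    Each record is a dict with keys: id, name, address, zip, city, county, state.
    """
    matches = {}

    # Pass 1: exact match on name and address.
    exact_index = {}
    for ent in entities:
        exact_index.setdefault((ent["name"], ent["address"]), ent["id"])
    for loan in loans:
        key = (loan["name"], loan["address"])
        if key in exact_index:
            matches[loan["id"]] = (exact_index[key], "exact")

    # Later passes: fuzzy match among entities sharing the same geography.
    for level in LEVELS:
        for loan in loans:
            if loan["id"] in matches:
                continue  # already matched at an earlier, stricter stage
            best_id, best_score = None, threshold
            for ent in entities:
                if ent[level] != loan[level]:
                    continue
                # Score a pair by the weaker of its name and address similarity.
                score = min(similarity(loan["name"], ent["name"]),
                            similarity(loan["address"], ent["address"]))
                if score >= best_score:
                    best_id, best_score = ent["id"], score
            if best_id is not None:
                matches[loan["id"]] = (best_id, level)
    return matches
```

Because a loan is skipped once it has matched, each loan keeps the match from the strictest stage at which it qualifies. This sketch allows one entity to match several loans and compares every pair within a geography, so it would need blocking or indexing to scale to population data.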

make_unique.txt -- This uses code with confidential items to clean up some overlap in Schedule C matches. Takes as input all of the c_matches.csv files (A_CO,CT_ID,IL_MI,MN_NY,OH_TN,TX_Z) and outputs sch_c_u.csv.

f941_by_year.txt -- Extracts Form 941 data and saves it as f941_year.csv for each year 2018-2021

/////////////////////////////////
//// Code for cleaning and preparing data
/////////////////////////////////
make_globals.do -- Read in the global variables
SQL code (above) --  Produces matches between the PPP data from SBA and the tax data from CDW
cleaning/import_matches.do -- Read in and combine all PPP matches from the CDW
SQL code -- Pulls quarterly Form 941 data (year, quarter, firm EINs, wages, employment counts)
cleaning/import_from_cdw.do -- Combine Form 941 data with PPP data and reformat tax and PPP data for use in regressions and distributional analysis (output: ppp_wide_dist)
SQL code -- Produces distributions of wages, unemployment compensation, and ownership shares of firms. Owner tax rates calculated here.
cleaning/distributional_data_2.do -- Read in distributional data and combine with PPP data (output: data/distributional_data)

//////////////////////////////////
///// Regression analysis code
//////////////////////////////////
analysis/reg_data.do -- Make sample restrictions and other preparations of the regression data 
analysis/f941_cells_regs.do -- Perform the main regression analysis (figures 2,A6)
analysis/wage_table.do -- Output main regression results into a table (Table 1)

//Robustness checks
analysis/f941_restaurant_regs.do -- Perform regression analysis for restaurants (figure A5)
analysis/f941_nopp_regs.do -- Perform regression analysis using never-treated as control (figure A7)
analysis/f941_regs_nynj.do -- Perform regression analysis for New York and New Jersey separately (figure A8)
analysis/f941_cells_regs_autor.do -- Perform regression analysis for comparison with Autor et al.

///////////////////////////////////////////////////
///// Various Summary Statistics, Tables and Figures
///////////////////////////////////////////////////
analysis/ppp_tabs.do -- PPP loan statistics
analysis/sum_stats_owner_match.do -- Statistics on firm owners and scale-up factors
analysis/summarize_941_matches.do -- Statistics on Form 941 and scale-up factors (figures A2,A4,A9)
analysis/summarize_matches.do -- Summarize matches (Tables A1,A2)
analysis/summarize_w2_matches.do -- Summarize W-2 matches and scale-up factors
analysis/threshold_500_line_plot.do -- Produce appendix figure A1: uptake rates by firm size
analysis/sba_schk.do -- Compares observed PPP amounts in tax data with matches (see appendix 2,figure A2)

//////////////////////////////////
///// Distributional analysis code
//////////////////////////////////
analysis/explore_ui_1.do -- Estimate the amounts of unemployment compensation avoided because of the PPP (figures 4,A10)
analysis/distribution_and_tax.do -- Output the distributional estimates (figures 3,4,A3,A10)

//////////////////////////////////
///// Data Output
//////////////////////////////////
See the online Excel spreadsheet for data outputs.